Probabilistic sequence models for image sequence processing and recognition
Author
Abstract
This PhD thesis investigates three image sequence labeling problems: optical character recognition (OCR), object tracking, and automatic sign language recognition (ASLR). We examine which concepts and ideas from speech recognition can be transferred to these problems, and for each task we propose an approach that is built around techniques known from speech recognition and adapted to the problem at hand. In particular, we describe our hidden Markov model (HMM) based image sequence recognition system, which has been adopted from a large vocabulary continuous speech recognition (LVCSR) framework and extended for these tasks. For OCR, we present the RWTH Aachen University Optical Character Recognition (RWTH OCR) system, which was developed within the scope of this thesis. We analyze simple appearance-based features in combination with complex training algorithms, and present detailed discussions of discriminative features, discriminative training, and a novel discriminative, confidence-based unsupervised adaptation approach. For ASLR, we adapt the RWTH Aachen University Speech Recognition (RWTH ASR) framework to account for the multiple modalities that are important in sign language communication, e.g. hand configuration, place of articulation, hand movement, and hand orientation. Additionally, non-manual components such as facial expression and body posture are analyzed. Most features relevant to sign language require a robust tracking method. We propose a multi-purpose, model-free object tracking framework that is based on dynamic programming (DP) and is applied to hand and head tracking tasks in ASLR. In particular, a context-dependent optimization of tracking decisions over time makes it possible to track occluded objects robustly. The algorithm is inspired by the time alignment algorithm in speech recognition, which is guaranteed to find the optimal path w.r.t. a given criterion and avoids possibly wrong local decisions. All results in this work are evaluated either on standard benchmark databases or on novel, publicly available databases generated within the scope of this thesis. Our OCR system is evaluated on various handwritten benchmark databases and for multiple languages. Additionally, a novel Arabic machine-printed newspaper database is presented and used for evaluation. Our dynamic programming tracking (DPT) framework and its different algorithms are evaluated for head and hand tracking in sign languages on more than 120,000 frames of annotated ground-truth data. The ASLR system is evaluated for multiple sign languages, such as American Sign Language (ASL), Deutsche Gebärdensprache (DGS), and Nederlandse Gebarentaal (NGT), on databases of different visual complexity. In all cases, highly competitive results are achieved, some of which outperform all other approaches known from the literature.
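The idea of deferring tracking decisions until the whole sequence has been scored can be illustrated with a small sketch. This is not the thesis's DPT implementation: the one-dimensional candidate positions, the `scores` table, the linear `jump_penalty`, and the function name `dp_track` are all assumptions chosen for illustration. The sketch accumulates the best score for every position in every frame, stores a backpointer to the best predecessor, and traces back the globally optimal path at the end.

```python
def dp_track(scores, jump_penalty=1.0):
    """Return the best position per frame under a global criterion.

    scores[t][x] is a hypothetical local score (higher is better) for the
    object being at candidate position x in frame t. Moving from position
    p to position x between frames costs jump_penalty * |x - p|.
    """
    n_pos = len(scores[0])
    best = [list(scores[0])]   # best[t][x]: best accumulated score ending at x
    back = []                  # back[t-1][x]: best predecessor of x in frame t

    for t in range(1, len(scores)):
        prev_row = best[-1]
        row, ptr = [], []
        for x in range(n_pos):
            # Pick the predecessor that maximizes accumulated score minus jump cost.
            p = max(range(n_pos),
                    key=lambda q: prev_row[q] - jump_penalty * abs(x - q))
            row.append(prev_row[p] - jump_penalty * abs(x - p) + scores[t][x])
            ptr.append(p)
        best.append(row)
        back.append(ptr)

    # Traceback: start at the best final position and follow the backpointers.
    x = max(range(n_pos), key=lambda q: best[-1][q])
    path = [x]
    for ptr in reversed(back):
        x = ptr[x]
        path.append(x)
    path.reverse()
    return path


# Frame 1 imitates an occlusion: the object gives a weak response at
# position 2 while a distractor at position 4 scores higher locally.
scores = [
    [0, 5, 0, 0, 0],
    [0, 0, 1, 0, 3],
    [0, 0, 5, 0, 0],
]
greedy = [max(range(len(row)), key=row.__getitem__) for row in scores]
print(greedy)            # [1, 4, 2]
print(dp_track(scores))  # [1, 2, 2]
```

In this toy example the per-frame (greedy) decision jumps to the distractor in the middle frame, whereas the traceback keeps a smooth trajectory because the decision for each frame is only fixed after all frames have been scored.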
Similar resources
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research shows that using deep neural networks (DNNs) in speech recognition systems significantly improves their performance. DNN-based phoneme recognition systems have two phases: training and testing. Mos...
Image Segmentation using Gaussian Mixture Model
Abstract: Stochastic models such as mixture models, graphical models, Markov random fields, and hidden Markov models play a key role in probabilistic data analysis. In this paper, we applied a Gaussian mixture model to the pixels of an image. The parameters of the model were estimated by the EM algorithm. In addition, the label corresponding to each pixel of the true image was assigned by Bayes' rule. In fact,...
IMAGE SEGMENTATION USING GAUSSIAN MIXTURE MODEL
Stochastic models such as mixture models, graphical models, Markov random fields, and hidden Markov models play a key role in probabilistic data analysis. In this paper, we have fitted a Gaussian mixture model to the pixels of an image. The parameters of the model have been estimated by the EM algorithm. In addition, the label corresponding to each pixel of the true image is assigned by Bayes' rule. In fact, ...
Seismic Data Forecasting: A Sequence Prediction or a Sequence Recognition Task
In this paper, we attempt to predict earthquake events in a cluster of seismic data on the Pacific Ring of Fire, using multivariate adaptive regression splines (MARS). The model is employed either as a predictor for a sequence prediction task, or as a binary classifier for a sequence recognition problem, which could alternatively help to predict an event. Here, we explain that sequence prediction/r...
A new approach to video sequence recognition based on statistical methods
In this paper, a fast method for image sequence recognition is presented. The method is based on a discrete statistical model consisting of a vector quantizer and a special probabilistic neural network, which allows image sequences to be classified without applying rules that depend on the content of the sequence. The simple feature extraction also allows classification with discrete Hidden Markov ...
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert spoken utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...